AITopics

Country:

Asia > China (0.06)
Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.43)

Neural Information Processing Systems, Feb-16-2026, 05:42:20 GMT

Supplementary Material for "Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching" Junsheng Zhou, Baorui Ma

We provide the detailed network structure of the pixel branch in Figure 1(c).

artificial intelligence, dataset, machine learning, (16 more...)

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
Asia > China > Beijing > Beijing (0.05)
North America > United States (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Neural Information Processing Systems, Oct-9-2025, 03:50:56 GMT

a61023ce36d21010f1423304f8ec49af-Supplemental-Conference.pdf

artificial intelligence, dataset, nyu-depth-v2 dataset, (11 more...)

Country:

Asia > China (0.06)
Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence (0.77)

Neural Information Processing Systems, Oct-9-2025, 03:06:32 GMT

Supplementary Material for "Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching" Junsheng Zhou, Baorui Ma

We provide the detailed network structure of the pixel branch in Figure 1(c).

artificial intelligence, dataset, machine learning, (16 more...)

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
Asia > China > Beijing > Beijing (0.05)
North America > United States (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

arXiv.org Artificial Intelligence, May-27-2025

LTDA-Drive: LLMs-guided Generative Models based Long-tail Data Augmentation for Autonomous Driving

Yurt, Mahmut, Ye, Xin, Ma, Yunsheng, Luo, Jingru, Mallik, Abhirup, Pauly, John, Yaman, Burhaneddin, Ren, Liu

3D perception plays an essential role in improving the safety and performance of autonomous driving. Yet, existing models trained on real-world datasets, which naturally exhibit long-tail distributions, tend to underperform on rare and safety-critical, vulnerable classes, such as pedestrians and cyclists. Existing studies on reweighting and resampling techniques struggle with the scarcity and limited diversity within tail classes. To address these limitations, we introduce LTDA-Drive, a novel LLM-guided data augmentation framework designed to synthesize diverse, high-quality long-tail samples. LTDA-Drive replaces head-class objects in driving scenes with tail-class objects through a three-stage process: (1) text-guided diffusion models remove head-class objects, (2) generative models insert instances of the tail classes, and (3) an LLM agent filters out low-quality synthesized images. Experiments conducted on the KITTI dataset show that LTDA-Drive significantly improves tail-class detection, achieving a 34.75% improvement for rare classes over counterpart methods. These results further highlight the effectiveness of LTDA-Drive in tackling long-tail challenges by generating high-quality and diverse data.

large language model, machine learning, natural language, (16 more...)

2505.18198

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.63)
Information Technology > Robotics & Automation (0.63)
Automobiles & Trucks (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ramasamy, Mathanesh Vellingiri, Kurniasalim, Dimas Rizky

Road Segmentation for ADAS/AD Applications

arXiv.org Artificial Intelligence, May-20-2025

Accurate road segmentation is essential for autonomous driving and ADAS, enabling effective navigation in complex environments. This study examines how model architecture and dataset choice affect segmentation by training a modified VGG-16 on the Comma10k dataset and a modified U-Net on the KITTI Road dataset. Both models achieved high accuracy, with cross-dataset testing showing VGG-16 outperforming U-Net, despite U-Net being trained for more epochs. We analyze model performance using metrics such as F1-score, mIoU, and precision, discussing how architecture and dataset impact results. Road image segmentation plays a crucial role in applications such as autonomous driving (AD), advanced driver assistance systems (ADAS), traffic monitoring, and smart city development.
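The metrics named in this abstract (precision, F1-score, mIoU) can be illustrated for a binary road/background mask. The sketch below is a minimal, generic implementation and is not taken from the paper; the function name `binary_segmentation_metrics`, the flat-list mask layout, and the convention that 1 means "road" are assumptions made for illustration.

```python
def binary_segmentation_metrics(pred, target):
    """Precision, recall, F1, and mIoU for flat binary masks (1 = road, 0 = background).

    Hypothetical helper for illustration; masks are plain lists of 0/1 values.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Confusion-matrix counts with "road" as the positive class.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Per-class IoU; for the background class, TN plays the role of TP.
    iou_road = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    miou = (iou_road + iou_background) / 2

    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy example: six pixels, one false positive and one false negative.
metrics = binary_segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Averaging the per-class IoU over both classes is one common mIoU convention; benchmark evaluations may define it differently (for example, accumulating counts over a whole dataset before dividing).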

artificial intelligence, deep learning, machine learning, (17 more...)

2505.12206

Country:

Europe > Sweden (0.15)
Europe > Germany (0.14)

Genre: Research Report (0.70)

Industry:

Automobiles & Trucks (0.75)
Transportation > Ground > Road (0.55)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.70)

arXiv.org Artificial Intelligence, Mar-11-2025

GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats

Deng, Kai, Yang, Jian, Wang, Shenlong, Xie, Jin

Tracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first NeRF/3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI and KITTI 360 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.

dataset, representation, sequence, (15 more...)

2503.08071

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Marcus, Richard, Vogel, Christian, Jatzkowski, Inga, Knoop, Niklas, Stamminger, Marc

Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios

arXiv.org Artificial Intelligence, Feb-20-2025

An important factor in advancing autonomous driving systems is simulation. Yet, there has been relatively little progress on transferability between the virtual and real world. We revisit this problem for 3D object detection on LiDAR point clouds and propose a dataset generation pipeline based on the CARLA simulator. Utilizing domain randomization strategies and careful modeling, we are able to train an object detector on the synthetic data and demonstrate strong generalization capabilities to the KITTI dataset. Furthermore, we compare different virtual sensor variants to gather insights into which sensor attributes can be responsible for the prevalent domain gap. Finally, fine-tuning with a small portion of real data almost matches the baseline, and fine-tuning with the full training set slightly surpasses it.

dataset, point cloud, vehicle, (13 more...)

2502.15076

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
North America > United States (0.04)
Europe > Italy > Lazio > Rome (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Leitenstern, Maximilian, Alten, Marko, Bolea-Schaser, Christian, Kulmer, Dominik, Weinmann, Marcel, Lienkamp, Markus

FlexCloud: Direct, Modular Georeferencing and Drift-Correction of Point Cloud Maps

arXiv.org Artificial Intelligence, Feb-1-2025

Current software stacks for real-world applications of autonomous driving leverage map information to ensure reliable localization, path planning, and motion prediction. An important field of research is the generation of point cloud maps, referring to the topic of simultaneous localization and mapping (SLAM). As most recent developments do not include global position data, the resulting point cloud maps suffer from internal distortion and missing georeferencing, preventing their use for map-based localization approaches. Therefore, we propose FlexCloud for automatic georeferencing of point cloud maps created from SLAM. Our approach is designed to work modularly with different SLAM methods, utilizing only the generated local point cloud map and its odometry. Using the corresponding GNSS positions enables direct georeferencing without additional control points. By leveraging a 3D rubber-sheet transformation, we can correct distortions within the map caused by long-term drift while maintaining its structure. Our approach enables the creation of consistent, globally referenced point cloud maps from data collected by a mobile mapping system (MMS). The source code of our work is available at https://github.com/TUMFTM/FlexCloud.

artificial intelligence, odometry trajectory, trajectory, (14 more...)

2502.00395

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Germany > Baden-Württemberg (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Automobiles & Trucks (0.49)
Transportation > Ground > Road (0.35)
Information Technology > Robotics & Automation (0.35)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)

Velasco-Sánchez, Edison P., Recalde, Luis F., Li, Guanrui, Candelas-Herias, Francisco A., Puente-Mendez, Santiago T., Torres-Medina, Fernando

DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions

arXiv.org Artificial Intelligence, Oct-17-2024

This paper reports on a novel method for LiDAR odometry estimation, which completely parameterizes the system with dual quaternions. To accomplish this, the features derived from the point cloud, including edges, surfaces, and Stable Triangle Descriptor (STD), along with the optimization problem, are expressed in the dual quaternion set. This approach enables the direct combination of translation and orientation errors via dual quaternion operations, greatly enhancing pose estimation, as demonstrated in comparative experiments against other state-of-the-art methods. Our approach reduced drift error compared to other LiDAR-only odometry methods, especially in scenarios with sharp curves and aggressive movements with large angular displacement. DualQuat-LOAM is benchmarked against several public datasets. On the KITTI dataset, it achieves a translation error of 0.79% and a rotation error of 0.0039°/m, with an average run time of 53 ms.
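As background for how a dual quaternion carries rotation and translation in a single object, the following is a minimal sketch of standard unit dual quaternion algebra. It is not the DualQuat-LOAM implementation; all function names (`qmul`, `dq_from_rt`, `dq_mul`, `dq_transform_point`, and so on) and the `(w, x, y, z)` tuple layout are assumptions chosen for illustration.

```python
import math


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def qconj(q):
    """Quaternion conjugate (inverse for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_about_z(angle_rad):
    """Unit quaternion for a rotation of angle_rad about the z axis."""
    return (math.cos(angle_rad / 2), 0.0, 0.0, math.sin(angle_rad / 2))


def dq_from_rt(rot, trans):
    """Dual quaternion (real, dual) for p -> R p + t, with dual = 0.5 * t * real."""
    dual = tuple(0.5 * c for c in qmul((0.0, *trans), rot))
    return (rot, dual)


def dq_mul(a, b):
    """Compose two rigid transforms: the result applies b first, then a."""
    real = qmul(a[0], b[0])
    dual = tuple(x + y for x, y in zip(qmul(a[0], b[1]), qmul(a[1], b[0])))
    return (real, dual)


def dq_translation(dq):
    """Recover the translation vector as 2 * dual * conj(real)."""
    return tuple(2.0 * c for c in qmul(dq[1], qconj(dq[0]))[1:])


def dq_transform_point(dq, point):
    """Apply the rigid transform encoded by dq to a 3D point."""
    rotated = qmul(qmul(dq[0], (0.0, *point)), qconj(dq[0]))[1:]
    return tuple(r + t for r, t in zip(rotated, dq_translation(dq)))


# Rotate 90 degrees about z, then translate by (1, 0, 0).
pose_a = dq_from_rt(quat_about_z(math.pi / 2), (1.0, 0.0, 0.0))
# Pure translation by (0, 2, 0).
pose_b = dq_from_rt((1.0, 0.0, 0.0, 0.0), (0.0, 2.0, 0.0))
combined = dq_mul(pose_a, pose_b)
moved = dq_transform_point(combined, (1.0, 0.0, 0.0))
```

Because one product composes rotation and translation together, a pose error can be expressed as a single dual quaternion rather than as two separately weighted terms, which is the property the abstract refers to.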

artificial intelligence, descriptor, machine learning, (19 more...)